PREFACE


This  paper  was  prepared  by  the  Institute  for  Defense  Analyses  (IDA)  for  the 
Ballistic  Missile  Defense  Organization  (BMDO)  under  a  task  entitled  “Strategic  Defense 
System  Phase  One  Engineering  and  Technical  Support  (POET).”  The  objective  of  the  task 
was  to  provide  analyses  and  recommendations  to  BMDO  for  defining  a  baseline  concept  for 
the  Phase  One  Strategic  Defense  System  (SDS).  In  support  of  that  objective,  IDA  examined 
the  cost  and  schedule  estimates  for  the  software  involved  in  SDS  architectures.  Part  of  that 
work  involved  developing  methods  for  estimating  the  costs  and  schedules  of  SDS 
software. This paper presents these methods and explains how they were developed.

This  work  was  reviewed  by  Beth  Springsteen,  Thomas  P.  Frazier,  and  J.  Richard 
Nelson  of  IDA  and  William  Kuhn  of  the  MITRE  Corporation. 

EXECUTIVE SUMMARY


The  Ballistic  Missile  Defense  Organization  (BMDO)  asked  the  Phase  One 
Engineering  Team  (POET)  to  investigate  the  factors  that  drive  software  development  costs 
and  schedules.  In  particular,  the  BMDO  asked  the  POET1  to  obtain  and  analyze  historical 
data  from  past  space  system  programs  to  identify  the  characteristics  that  influence  software 
development  costs  and  schedules. 

Many  previous  research  efforts  have  shown  that  software  size,  measured  by  source 
lines  of  code,  drives  software  development  costs  and  schedules.  This  report  addresses  the 
factors  that  affect  software  development  costs  and  schedules  for  space  systems.  The  study 
focuses  on  software  for  both  the  embedded  flight  and  ground  segments. 

We  used  data  from  the  Air  Force  Space  and  Missile  Systems  Center  (SMC) 
database.  The  data  in  the  SMC  database  were  collected  from  the  Space  Systems  Cost 
Analysis  Group  (SSCAG)  participants.  The  SSCAG  is  an  industry  and  government  group 
formed  to  enhance  space  system  cost  analysis.  Another  data  source  was  the  NASA 
Goddard  Space  Flight  Center  database. 

We  evaluated  and  normalized  the  databases  into  a  consistent  format.  We  used  data 
at  the  level  of  the  computer  software  configuration  item  (the  next  level  of  information  down 
from  a  software  project).  The  data  were  segregated  by  basing  mode,  software  type, 
mission  equipment  (manned  and  unmanned),  and  user  (DoD  and  NASA).  Multiple 
regression  was  used  to  develop  cost-  and  time-estimating  relationships. 

In  analyzing  software  costs,  we  examined  the  following  cost  drivers:  size,  basing 
mode  (ground  and  flight),  software  type,  mission  equipment  type,  and  user.  The  schedule 
analysis  examined  size  and  staffing  level  as  schedule  drivers.  The  following  is  a  summary 
of  our  findings. 

•  Cost: 

-  Software  size  is  still  a  good  predictor  of  software  development  cost. 
However,  software  type  (application  and  support)  and  user  (DoD  and 
NASA)  are  also  important  cost  factors. 


1  POET is a conglomerate of Federally Funded Research and Development Centers (including IDA) that
supports BMDO.




-  Ground segment support software costs about 20% to 25% less to
develop than application software.

-  The DoD's software development costs are higher than NASA's, about
60% more for the ground segment and 40% more for embedded flight
software.

-  The DoD's embedded flight software development costs are on average
five times higher than ground segment software costs.

-  Embedded flight software development exhibits more diseconomies of
scale than does ground segment software development.

•  Schedule: 

-  Software  size,  staffing  level,  and  basing  modes  are  the  drivers  of  software 
development  schedules. 

-  Adding  staff  shortens  software  development  duration  at  a  decreasing  rate, 
because  inefficiency  is  a  by-product  of  larger  staff  sizes. 

We  make  the  following  recommendations  to  improve  the  BMDO’s  software  cost 
and  schedule  estimating. 

•  Examine the effects of the Ada programming language on software size, cost,
and schedule. (Our study, which did not include Ada programs, could be
updated when Ada data points are available.)

•  Examine  historical  data  on  ground-based  battle  management  and  command, 
control,  and  communications  programs.2  (Our  study  did  not  include  data 
points  from  programs  of  this  nature.) 


2  The  BMDO’s  Command  and  Control  Element  (C2E)  is  the  most  software-intensive  element.  It 
performs  battle  management  and  command,  control,  and  communications  functions  that  are 
unprecedented  in  terms  of  functionality  and  complexity. 

I.  INTRODUCTION


A.  BACKGROUND 

Department  of  Defense  (DoD)  software  expenditures  for  weapon  systems  have 
grown  tremendously  during  the  last  twenty  years  [1].  Weapon  systems  in  the  early  1970s 
did  not  have  any  software,  while  current  systems  have  over  one  million  source  lines  of 
code  (SLOC). 

When  this  research  effort  began  in  1992,  the  estimate  for  the  total  size  of  software 
in  the  elements  of  the  former  Strategic  Defense  Initiative  Organization  (SDIO)  was  over  ten 
million  SLOC.  [The  SDIO  elements  in  1992  were  Brilliant  Pebbles,  Brilliant  Eyes, 
Ground-based  Radar,  Battle  Management  and  Command,  Control,  and  Communications 
(BM/C3),  etc.]  SDIO  is  now  the  Ballistic  Missile  Defense  Organization  (BMDO),  and 
although  its  mission  has  changed  from  strategic  to  tactical  missile  defense,  software  is  still 
an  important  part  of  the  system. 

Two  offices  within  BMDO  oversee  software  size  and  cost  estimation.  The  System 
Engineering  and  Integration  Directorate  is  responsible  for  reviewing  the  software  size 
estimates  provided  by  the  various  BMDO  element  project  offices.  These  software  size 
estimates  are  included  in  the  Cost  Analysis  Requirements  Documents  (CARD).  The  Cost 
Estimating  and  Analysis  Directorate  is  responsible  for  estimating  software  costs  and 
schedules  from  information  contained  in  the  CARDs. 

To  perform  these  analyses,  analysts  must  identify  and  understand  the  technical 
parameters  that  influence  the  cost  of  future  software-intensive  BMDO  elements.  Insights 
into  these  relationships  permit  independent  assessments  of  software  estimates  provided  by 
the  element  project  offices. 

SDIO  asked  the  Phase  One  Engineering  Team  (POET)  to  investigate  technical 
parameters  that  drive  software  development  size,  cost,  and  schedule  for  future  space-based 
systems.  The  POET,  a  conglomerate  of  Federally  Funded  Research  and  Development 
Centers  that  supports  BMDO,  performed  similar  work  in  1990  where  software 
development  costs  were  found  to  vary  significantly  by  the  accommodating  hardware 
location  (ground,  air,  and  space)  [2]. 

As part of POET's effort, the Institute for Defense Analyses (IDA) studied software
development  costs  and  schedules,  and  The  Aerospace  Corporation  studied  software  size. 
This  paper  documents  IDA’s  portion  of  the  overall  POET  effort.  The  entire  POET  effort, 
including  the  contributions  by  analysts  of  The  Aerospace  Corporation,  is  documented  in  a 
separate  report  [3]. 

B.  APPROACH 

The  focus  of  the  study  was  to  analyze  existing  databases  that  contain  robust 
samples  of  historical  software  development  efforts.  First,  we  analyzed  the  Space  Systems 
Cost  Analysis  Group  (SSCAG)  database  [4].  The  SSCAG,  sponsored  by  the  U.S.  Air 
Force  Space  and  Missile  Systems  Center  (SMC)  and  the  National  Aeronautics  and  Space 
Administration  (NASA),  is  a  government  and  industry  working  group  formed  to  advance 
space  systems  cost  analysis.  The  SSCAG  database  contains  software  development 
information  of  past  programs  submitted  by  contractors  and  data  collected  by  SMC  and 
NASA.  Management  Consulting  and  Research,  Incorporated,  maintains  the  SSCAG 
database  for  SMC.  We  also  used  the  NASA  Goddard  Space  Flight  Center  [5]  software 
development  database.  This  database  contains  data  that  do  not  overlap  the  SSCAG 
database. 

We  developed  cost-estimating  relationships  (CERs)  at  the  level  of  the  computer 
software  configuration  item  (CSCI).  We  developed  separate  CERs  for  ground  segment  and 
embedded  flight  software.  We  also  investigated  cost  drivers  based  on  the  mission 
equipment  type  (unmanned  and  manned),  software  type  (application,  support,  and 
operating  system),  and  user  (DoD  and  NASA).  To  estimate  software  schedule  duration,  we 
developed  time-estimating  relationships  (TERs). 

C.  SCOPE 

We  examined  software  size  for  DoD  and  NASA  space  missions  for  both  embedded 
flight  and  ground  segment  software.  The  CERs  and  TERs  developed  estimate  software 
development  efforts  from  product  design  through  CSCI  integration  and  test.  Factors  are 
available  to  estimate  the  other  activities  (system  requirements  and  system-level  integration 
and  tests)  not  addressed  by  the  CERs  and  TERs  in  our  study.  Independent  verification  and 
validation  before  system  deployment  is  also  not  included  in  our  analysis. 

The  programs  included  in  our  database  used  the  following  non-Ada  programming 
languages:  FORTRAN,  Assembly,  Jovial,  and  PL1. 

The methods derived from our analysis could be used to estimate software
development  costs  and  schedules  of  a  satellite  system.  The  data  used  in  this  study  did  not 
include  command  and  control  systems  with  intensive  BM/C3  functions. 

D.  REPORT ORGANIZATION

This  report  is  divided  into  four  chapters.  Following  this  introduction  (Chapter  I), 
Chapter II documents the development of the CERs. Chapter III describes the schedule
analysis,  and  Chapter  IV  summarizes  the  findings  of  our  research  and  presents  a  list  of 
recommendations  to  improve  BMDO  software  cost  estimation.  Appendix  A  presents 
examples  of  application  of  the  models  presented  in  the  text,  and  Appendix  B  describes  the 
analysis  of  residuals  to  eliminate  outliers  in  the  data. 




II.  COST ESTIMATION


A.  INTRODUCTION 

This  chapter  documents  the  CERs  developed  for  embedded  flight  and  ground 
segment  software.  It  first  describes  the  method  used  to  derive  the  CER  and  then  discusses 
the  data  sources  and  data  sample.  Section  D  addresses  the  data  evaluation  and  normalization 
process.  Section  E  explains  the  notion  of  equivalent  source  lines  of  code  (ESLOC),  and 
Section  F  discusses  the  CER  development  process.  Section  G  presents  the  CERs  and 
results.  Section  H  provides  factors  to  be  used  to  account  for  activities  not  included  in  the 
CERs.  Finally,  Section  I  explores  issues  surrounding  BMDO  software  cost  estimating. 

B.  METHOD 

The  method  used  to  develop  the  CERs  to  estimate  software  development  cost 
followed  previous  work  by  IDA  for  BMDO  [4].  The  basic  framework  traditionally  used  to 
derive  software  development  CERs  assumes  that  costs  are  related  to  software  size  in  an 
exponential  form  [6]: 

$\mathrm{Effort} = A \times (\mathrm{Size})^{B}$    (II-1)

In  this  equation,  effort  is  a  measure  of  the  number  of  man-months  required  to  develop  the 
software.1  The  coefficient  A  is  the  intercept  term  derived  through  a  log  transformation 
regression.  The  input,  size,  is  measured  by  the  number  of  source  lines  of  code.  The 
exponent  B  is  derived  from  the  regression  analysis. 
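
The log transformation referred to above can be written out explicitly. Taking natural logarithms of both sides of Equation II-1 gives a relationship that is linear in its parameters, which is what allows a linear regression to estimate A and B. The error term $\varepsilon$ is notation introduced here for illustration, not a symbol used in the report:

```latex
\ln(\mathrm{Effort}) = \ln A + B \,\ln(\mathrm{Size}) + \varepsilon
```

In this form, $\ln A$ is the regression intercept and $B$ is the slope on $\ln(\mathrm{Size})$.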

In  addition  to  the  traditional  size  cost  driver,  we  examined  the  type  of  equipment 
used  for  the  mission  (manned  and  unmanned  space  systems)  and  the  software  type 
(application,  support,  and  operating  system)  as  potential  cost  drivers.  Cost  differences 
associated  with  the  different  users  (DoD  and  NASA)  were  also  investigated  because  DoD 
programs  tend  to  be  operational  with  long  lifetimes  while  NASA  programs  tend  to  be 
experimental  and  short-lived. 


1  Software cost is measured in man-months of effort, which includes management, design, programming,
test and simulation, training, and database administration.

We analyzed the residuals to eliminate outliers that influenced the fit of the
regression  function  in  our  model.  The  methods  used  were  Hat  Diagonal  Matrix, 
RSTUDENT,  and  Cook’s  Distance  analyses  (see  Appendix  B). 
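
The three diagnostics named above can be sketched for the simplest case, a one-predictor regression of log effort on log size. This is a minimal illustration and not the procedure of Appendix B, which is not reproduced here: the function name, the sample values, and the cutoff of 2 on RSTUDENT are all assumptions made for the example.

```python
import math


def simple_ols_diagnostics(x, y):
    """Leverage, RSTUDENT, and Cook's distance for the fit y = a + b*x."""
    n = len(x)
    p = 2  # number of fitted parameters: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)
    mse = sse / (n - p)
    rows = []
    for xi, e in zip(x, resid):
        # Diagonal element of the hat matrix (leverage) for this observation.
        h = 1.0 / n + (xi - x_bar) ** 2 / sxx
        # Cook's distance: influence of this observation on all fitted values.
        cooks_d = (e * e / (p * mse)) * h / (1.0 - h) ** 2
        # RSTUDENT: residual scaled by the error variance estimated
        # with this observation deleted.
        mse_deleted = (sse - e * e / (1.0 - h)) / (n - p - 1)
        rstudent = e / math.sqrt(mse_deleted * (1.0 - h))
        rows.append({"leverage": h, "rstudent": rstudent, "cooks_d": cooks_d})
    return rows


# Hypothetical log-size and log-effort values; the last point is a deliberate outlier.
x_demo = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y_demo = [0.1, 0.9, 2.1, 2.9, 4.1, 9.0]
for i, row in enumerate(simple_ols_diagnostics(x_demo, y_demo)):
    if abs(row["rstudent"]) > 2.0:  # assumed rule-of-thumb cutoff
        print("candidate outlier at index", i)
```

A multiple regression with indicator variables would need the full hat matrix rather than the closed-form leverage used here.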

C.  DATABASE 

The  data  in  our  analysis  came  from  three  sources:  the  Space  Systems  Cost  Analysis 
Group  (SSCAG)  [4],  the  Jet  Propulsion  Lab  (JPL)  [7],2  and  the  Goddard  Space  Flight 
Center [5]. The SSCAG database contains data on software used in various space
programs, including the Space Shuttle, and covers both ground segment and embedded flight
software. It contains data from 22 member companies, including the Space and Missile
Systems Center (SMC). The software development programs in the SSCAG database are
recorded at the CSCI and project levels, with size measured in source lines of code and
effort in man-months. NASA data included in the SSCAG database were from JPL programs
and the Space Shuttle program.

The  Goddard  database  comprises  flight  and  ground  segment  software  for  space 
programs  managed  by  Goddard.  Except  for  the  Spacelab  data  points,  all  of  the  Goddard 
data  are  from  unmanned  space  missions  such  as  the  High  Energy  Astrophysical 
Observatory,  Solar  Max,  and  the  Earth  Radiation  Budget  Satellite. 

D.  DATA  EVALUATION  AND  NORMALIZATION 

We  evaluated  each  database  for  data  content  and  possible  overlap.  Each  data  point 
was  checked  for  correct  basing  mode  (where  the  software  resides),  software  type 
(application,  support,  operating  system),  and  size.  The  data  points  that  did  not  have  all 
development  phases  (preliminary  design  through  CSCI  integration  and  test,  as  shown  in 
Figure II-1) were excluded. We also verified the information in the database through
meetings with SMC and Management Consulting and Research, Incorporated, and review
of source documents [5, 6, and 8]. Only actual values were used for the analysis (estimated
values  were  excluded). 

First,  we  divided  the  data  into  two  basing  modes:  ground  segment  and  embedded 
flight.  Within  each  basing  mode  the  data  were  further  classified  by  mission  type  (unmanned 
space  and  manned  space  missions).  We  then  segregated  the  data  into  three  software  types 
(application,  support,  and  operating  system)  to  test  the  hypothesis  that  support  software  is 
the least expensive of the three. Figure II-2 depicts our scheme for classifying the data.


2  The  JPL  database  is  included  in  the  SSCAG  database. 

Figure II-1. Software Development Phases

Figure II-2. Database Classification


Application  software  is  specific  to  satellite  or  payload  missions.  Such  software  is 
critical  to  the  mission  and  requires  a  high  degree  of  real-time  processing.  Real-time 
processing  provides  output  that  does  not  delay  the  user  or  process  [9].  This  includes  signal 
processing,  mission  control,  command,  control,  and  communications,  and  so  on. 

Support  software  supports  the  mission  and  is  not  critical  to  basic  operation. 
Support software could be considered “off-line” because it is not required for real-time
processing  in  order  to  complete  a  mission.  This  includes  post  processing,  simulation, 
training,  database  management,  maintenance,  test,  and  so  on.  The  difference  between 
application  and  support  software  is  the  degree  of  real-time  processing  required  to  execute 
the  task. 

Operating system software manages the hardware resources, including computer or
system  operations.  It  is  designed  to  operate,  maintain,  and  control  specific  computer 
equipment. 

Our  original  sample  included  253  data  points,  224  for  the  ground  segment  and  29 
for  embedded  flight.  The  ground  segment  data  included  171  unmanned  and  53  manned 
missions.  The  distribution  of  application  and  support  ground  segment  software  was  nearly 
equal.  Few  data  points  for  operating  system  software  were  in  the  database.  The  embedded 
flight  software  included  17  unmanned  missions  and  12  manned  missions.  Almost  all  of  the 
embedded  flight  software  was  classified  as  application  software. 

Later  in  the  study,  we  decided  to  analyze  only  data  for  the  unmanned  space  mission 
systems  for  two  reasons:  (1)  the  BMDO’s  main  interest  was  in  unmanned  space  missions, 
and  (2)  after  we  performed  residuals  analysis  to  eliminate  questionable  data  points 
(reducing  the  number  of  data  points  from  253  to  145),  too  few  manned  space  mission  data 
points  remained  in  the  database. 

The  ground  segment  data  for  our  reduced  data  set  included  136  unmanned 
missions.  The  DoD  data  points  accounted  for  6%  of  the  unmanned  missions.  The 
segregation between application and support ground segment software was 53% and 46%,
respectively. Only 1.5% of the data points were operating system software. The embedded
flight  software  for  our  reduced  data  set  included  only  9  unmanned  missions.  All  of  the 
embedded  flight  data  points  were  application  software. 

The  sources  and  numbers  of  data  points  for  the  unmanned  space  ground  segment 
and embedded flight software are shown in Table II-1.


Table II-1. Sources and Numbers of Cost Data Points by Software Type

                              SSCAG Data
                     ---------------------------
Software Type        DoD Mission    NASA Mission    Goddard Data    Total

Ground Segment
  Application              9             16              47           72
  Support                  0             35              27           62
  Operating System         0              1               1            2
  Total                    9             52              75          136

Embedded Flight
  Application              2              3               4            9
  Support                  0              0               0            0
  Operating System         0              0               0            0
  Total                    2              3               4            9


E.  EQUIVALENT  SOURCE  LINES  OF  CODE 


Since  software  size  is  the  primary  driver  of  software  cost,  a  convention  to  account 
for  the  true  size  of  a  CSCI,  including  reused  and  modified  software,  is  essential.  We 
measured  software  size  in  source  lines  of  code.  We  included  data  declarations,  job-control 
language,  files,  tests,  simulations,  and  training,  but  excluded  comment  lines,  commercial 
off-the-shelf  software,  and  in-house  software.  We  adjusted  the  software  size  for  whether 
the  code  was  reused  and/or  modified  using  the  method  documented  in  [7]: 

ESLOC  =  New  SLOC  +  0.5  Modified  SLOC  +  0.25  Inherited  SLOC. 

ESLOC is equivalent source lines of code, New SLOC is newly developed code,
Inherited SLOC is synonymous with reused code, and Modified SLOC falls between new and
reused code.
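
The ESLOC convention above can be sketched as a small function. The weights come from the equation in this section; the function name and the sample counts are hypothetical and chosen only to show the arithmetic.

```python
def equivalent_sloc(new_sloc, modified_sloc=0.0, inherited_sloc=0.0):
    """ESLOC = New SLOC + 0.5 * Modified SLOC + 0.25 * Inherited SLOC."""
    return new_sloc + 0.5 * modified_sloc + 0.25 * inherited_sloc


# Hypothetical CSCI: 20,000 new, 8,000 modified, and 12,000 inherited lines.
esloc = equivalent_sloc(20_000, modified_sloc=8_000, inherited_sloc=12_000)
eksloc = esloc / 1000.0  # the CERs take size in thousands of ESLOC
print(esloc, eksloc)
```

For these assumed counts the equivalent size is 20,000 + 4,000 + 3,000 = 27,000 ESLOC, or 27 EKSLOC.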

F.  CER  DEVELOPMENT 

We  analyzed  the  ground  segment  software  development  cost  for  the  application  and 
operating  system  software  separately  from  the  support  software.  We  also  developed  a  CER 
for  ground  segment  software  for  DoD  users  only.  The  embedded  flight  CER  contains  only 
application  software. 

Development  cost,  measured  in  man-months  (MM)  of  effort,  is  the  dependent 
variable  in  the  multiple  regression  analyses.  Candidate  cost-driving  variables  are: 

•  Size measured in thousands of equivalent source lines of code (EKSLOC).

•  Embedded  flight  software  indicator  variable  (FLT).  This  1/0  indicator  variable 
has  a  value  of  1  for  embedded  flight  software  and  a  value  of  0  otherwise. 

•  Application  software  indicator  (APP).  This  1/0  indicator  has  a  value  of  1  for 
application  software  and  a  value  of  0  otherwise. 

•  Operating  system  software  indicator  (SYS).  This  1/0  indicator  has  a  value  of  1 
for  operating  system  software  and  a  value  of  0  otherwise. 

•  Support  software  indicator  (SUP).  This  1/0  indicator  has  a  value  of  1  for 
support  software  and  a  value  of  0  otherwise. 

•  Department  of  Defense  user  indicator  (DOD).  This  1/0  indicator  has  a  value  of 
1  for  DoD  users  and  a  value  of  0  otherwise. 

Our CERs take on the intrinsically linear multiplicative form $Y = A \times X^{B}$, as shown in
Equation II-1. To estimate the coefficients of this equation, we transformed the equation to a
logarithmic form and then applied ordinary least squares linear regression. When the
equation is transformed from the logarithmic form back to a multiplicative form, the
multiplicative residuals are assumed to be lognormally distributed. Because the lognormal
distribution is right-skewed, the expected value and most likely value (mode) of the
residuals are no longer equal. Therefore, an adjustment must be made for the multiplicative
form to yield the expected value of the dependent variable. We made this adjustment by
adding one-half of the regression mean square error to the constant term of the logarithmic
equation before transforming it into the multiplicative form [10]. We then transformed the
intercept term into a multiplicative constant, which yields an adjustment factor (adjusted
constant term/unadjusted constant term) greater than one on the multiplicative form. In
reporting the estimating relationships, we report the adjusted multiplicative equation along
with the factor so that the equation can be back-adjusted to yield the most likely value.
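
The fitting and adjustment steps described above can be sketched for a single size variable. This is an illustration under stated assumptions and not the study's code: the function name and data are hypothetical, the mean square error is taken to be the residual sum of squares divided by n - 2, and the indicator variables used in the actual CERs are omitted.

```python
import math


def fit_power_law_cer(sizes, efforts):
    """Fit Effort = A * Size**B by least squares on natural logs."""
    x = [math.log(s) for s in sizes]
    y = [math.log(e) for e in efforts]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    ln_a = y_bar - b * x_bar
    sse = sum((yi - (ln_a + b * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)  # assumed definition of the regression mean square error
    # Adding mse/2 to ln(A) multiplies A by exp(mse/2) after back-transformation.
    adjustment = math.exp(0.5 * mse)
    a_unadjusted = math.exp(ln_a)
    return {
        "A_unadjusted": a_unadjusted,
        "A_adjusted": a_unadjusted * adjustment,
        "B": b,
        "adjustment": adjustment,
    }


# Hypothetical (EKSLOC, man-month) pairs used only to exercise the function.
fit = fit_power_law_cer([1.0, 2.0, 4.0, 8.0], [3.5, 6.0, 17.0, 33.0])
print(fit)
```

Dividing the adjusted multiplier by the returned adjustment factor recovers the unadjusted multiplier, which corresponds to the back-adjustment mentioned in the text.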

G.  RESULTS 

We  developed  cost-estimating  relationships  for  both  ground  segment  and  embedded 
flight  space  mission  software  CSCIs.  For  the  ground  segment,  we  analyzed  the  application 
and  operating  system  software  separately  from  the  support  software.  For  the  embedded 
flight  software,  we  developed  only  one  CER  because  the  data  were  comprised  of 
application  software  only. 

1.  Space Mission Ground Segment CERs

First, we analyzed the application and operating system software using a multiple-input
CER with size (EKSLOC) and user type (NASA or DOD) variables. Then we
examined  the  support  software  using  a  single-input  CER  with  size  (EKSLOC)  as  the 
independent  variable. 

a.  Application  and  System  Software  CER 

The  application  and  operating  system  software  CER  for  ground-based  CSCIs  used 
for  unmanned  space  missions  is  presented  in  Equation  II-2. 

$\mathrm{MM} = 4.3 \times \mathrm{EKSLOC}^{1.08} \times 1.57^{\mathrm{DOD}}$    (II-2)

(30.9, .000)  (3.7, .000)

N = 74    Adjusted R2 = 0.93    SEE = 63    Intercept Adjustment = 1.06

The t-scores and probability levels are in parentheses below the parameter
estimates.3 N is the number of observations. The adjusted R2 indicates that 93% of the
variation in MM can be explained by the variables in Equation II-2 in log-transformed
space.4 SEE, the standard error of the estimate, is in the dimension of the dependent
variable and indicates better fit for smaller SEE values.5

The  sample  average  of  the  independent  variable  (EKSLOC)  in  Equation  II-2  is 
30.8.  The  range  of  EKSLOC  is  0.8  to  160.5. 

That the EKSLOC exponent is greater than one in Equation II-2 indicates that cost
increases at an increasing rate as size increases. Equation II-2 indicates that DoD application
and operating system software for the ground segment is 57% more expensive than NASA
software. This may be because the DoD systems in our database differ from NASA
programs in software development and documentation standards and
operational requirements.
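
As a hedged illustration of how Equation II-2 behaves, the sketch below evaluates it for a NASA and a DoD CSCI of the same size. The multiplier (4.3), exponent (1.08), and DoD factor (1.57) are the values reported for Equation II-2; the function name and the 30-EKSLOC input are hypothetical.

```python
def ground_app_os_effort_mm(eksloc, is_dod=False):
    """Equation II-2: man-months for ground application/operating system software."""
    if eksloc <= 0:
        raise ValueError("eksloc must be positive")
    dod_multiplier = 1.57 if is_dod else 1.0  # 1.57 raised to the DOD indicator
    return 4.3 * eksloc ** 1.08 * dod_multiplier


nasa_mm = ground_app_os_effort_mm(30.0)
dod_mm = ground_app_os_effort_mm(30.0, is_dod=True)
print(f"NASA: {nasa_mm:.0f} MM, DoD: {dod_mm:.0f} MM, ratio: {dod_mm / nasa_mm:.2f}")
```

Because the exponent exceeds one, doubling the size multiplies the estimate by 2^1.08, roughly 2.11, instead of exactly 2.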

b.  Support  Software  CER 

The  support  software  CER  for  ground-based  CSCIs  used  in  unmanned  space 
missions  is  shown  in  Equation  II-3. 

$\mathrm{MM} = 4.7 \times \mathrm{EKSLOC}^{0.98}$    (II-3)

(31.9, .000)

N = 62    Adjusted R2 = 0.95    SEE = 40.2    Intercept Adjustment = 1.07

Equation II-3 indicates that the support software development effort increases at a
decreasing  rate  as  the  size  increases,  implying  that  there  are  no  diseconomies  of  scale  in 
support  software.  This  outcome  is  intuitive  because  support  software  may  not  have  the 
complex  interfaces  that  tend  to  make  large  application  CSCIs  more  expensive. 


3  The t-score is the statistic that tests the null hypothesis that the coefficient B in Equation II-1 is equal
to zero against the alternative hypothesis that B is not equal to zero [11]. The t-score is the ratio of the
regression coefficient to its standard error. A t-score of about 2.0 implies >95% confidence that the
coefficient is significant. Higher t-scores imply greater confidence in the coefficient significance. An
analogy to this statistic might be the signal-to-noise ratio. The probability level statistic shows the
confidence level that the estimated coefficient is equal to zero. Lower probability values indicate greater
statistical significance.

4  The R2 is a measure of the fit of a regression equation. An adjustment is made to lessen the effect of
increasing the R2 value through the addition of independent variables. The adjusted R2 modifies the R2
to penalize the model containing additional variables when compared with alternative regression models
[12, p. 365]. An R2 of 1.00 indicates a perfect fit.

5  Reference [12, p. 118].

c.  DoD Ground Segment CER

For  the  DoD  user,  we  developed  a  separate  regression  model  to  estimate  the  cost  of 
ground  segment  software  for  military  space  missions  for  application  software  only.  The 
model is shown in Equation II-4.

$\mathrm{MM} = 7.2 \times \mathrm{EKSLOC}^{1.04}$    (II-4)

(10.6, .000)

N = 9    Adjusted R2 = 0.97    SEE = 53    Intercept Adjustment = 1.007

The average of the independent variable (EKSLOC) in Equation II-4 is 38.1. The
range  of  EKSLOC  is  6.8  to  116.8. 

Because  of  the  smaller  sample  size,  the  adjusted  R2  improved  to  97%.  However, 
software type was not a cost driver. The intercept (A in Equation II-1) in Equation II-4
(7.2) is 67% higher than the multiplier in Equation II-2 (4.3). This is consistent with the
estimate for the DoD dummy variable parameter (1.57) in Equation II-2.

The exponents of the variable EKSLOC in both Equations II-2 and II-4 are greater
than one. This suggests diseconomies of scale and is in accordance with conventional
CERs. However, CERs have been developed that show economies of scale [13, 14, and
15]. Given the description in Reference [7], our ground segment CER could apply to
semi-detached CSCIs.

2.  Space  Mission  Embedded  Flight  CER 

Due  to  insufficient  data  for  the  embedded  flight  software,  we  combined  the 
operating  system  and  application  software  data  for  the  CER  analysis.  To  develop  the 
embedded  flight  CER,  we  tested  all  the  cost  drivers  that  were  used  for  the  ground  segment 
CERs.  However,  only  software  size  and  DoD  user  proved  to  be  significant  variables,  as 
shown  in  Equation  II-5. 

$\mathrm{MM} = 8.3 \times \mathrm{EKSLOC}^{1.47} \times 1.38^{\mathrm{DOD}}$    (II-5)

(22.84, .000)  (2.75, .033)

N = 9    Adjusted R2 = 0.988    SEE = 58    Intercept Adjustment = 1.001

The  average  of  the  independent  variable  EKSLOC  for  unmanned  missions  in 
Equation  II-5  is  13  and  the  range  is  3  to  32. 

Equation II-5 indicates that DoD embedded flight software costs 38% more than
NASA software. That the EKSLOC exponent in Equation II-5 (1.47) is greater than in
Equation II-2 (1.08) implies more diseconomies of scale in embedded flight software
development than in ground segment software development. The multiplier in Equation II-5
(8.3) is about two times larger than the ground segment CER multiplier in Equation II-2
(4.3), which also indicates higher cost in embedded flight software development
compared with ground segment software. This higher cost in embedded flight software can
be attributed to a high degree of real-time processing, ultra-high reliability, interfaces with
other equipment besides computer hardware, and computer hardware obsolescence6 [16
and 17]. These complexity factors may also explain the diseconomies of scale associated
with this type of software.
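
The comparison between the two CERs can be made concrete by computing the ratio of the embedded flight estimate to the ground segment estimate at a given size. The sketch below uses the multipliers, exponents, and DoD factors reported for Equations II-5 and II-2; the function names and the choice of sizes are assumptions, and the ratio is only meaningful where the size lies within the data range of both equations.

```python
def flight_effort_mm(eksloc, is_dod=True):
    """Equation II-5: embedded flight CER, in man-months."""
    return 8.3 * eksloc ** 1.47 * (1.38 if is_dod else 1.0)


def ground_effort_mm(eksloc, is_dod=True):
    """Equation II-2: ground application/operating system CER, in man-months."""
    return 4.3 * eksloc ** 1.08 * (1.57 if is_dod else 1.0)


def flight_to_ground_ratio(eksloc, is_dod=True):
    """How many times costlier the flight estimate is at the same size."""
    return flight_effort_mm(eksloc, is_dod) / ground_effort_mm(eksloc, is_dod)


# Sizes spanning the reported embedded flight sample (3 to 32 EKSLOC).
for size in (3.0, 13.0, 32.0):
    print(size, round(flight_to_ground_ratio(size), 2))
```

Since the flight exponent is larger, the ratio grows with size, which is the diseconomy-of-scale effect described above.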

When  doing  cost  estimates  using  parametric  relationships,  analysts  must  understand 
the  relevant  range  of  the  data  with  which  the  relationships  were  developed.  The  ranges  of 
ESLOC (CSCI level) used for our models are provided in Table II-2.


Table II-2. ESLOC Ranges

ESLOC                                       Minimum Value    Average Value    Maximum Value

Ground Application and Operating System           400            26,000           192,700
Ground Support                                    800            31,000           160,500
Embedded Flight                                 3,000            13,000            32,000
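
An analyst applying the CERs could guard against extrapolation with a simple range check. The sketch below encodes the minimum and maximum ESLOC values from the table above; the dictionary keys and the function name are hypothetical labels introduced for the example.

```python
# (minimum, maximum) ESLOC at the CSCI level, taken from the ESLOC ranges table.
ESLOC_RANGES = {
    "ground_application_os": (400, 192_700),
    "ground_support": (800, 160_500),
    "embedded_flight": (3_000, 32_000),
}


def in_relevant_range(category, esloc):
    """Return True when esloc lies inside the data range for the given CER."""
    low, high = ESLOC_RANGES[category]
    return low <= esloc <= high


if not in_relevant_range("embedded_flight", 50_000):
    print("50,000 ESLOC is outside the embedded flight data range")
```

A size outside the range does not make the estimate invalid by itself, but it does mean the relationship is being extrapolated beyond the data used to fit it.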

H.  EFFORT  NOT  ACCOUNTED  FOR  BY  THE  CERs 

As  mentioned  previously,  our  database  captures  cost  from  product  design  through 
CSCI  integration  and  test.  This  does  not  include  planning,  requirements  analysis,  system 
integration,  and  test  activities.  Analysts  should  use  factors  associated  with  their  programs 
when  adjusting  the  CERs.  If  program  data  are  not  available,  use  the  effort  phase 
distribution  shown  in  Table  II-3.  Analysts  can  adjust  the  CERs  to  include  the  missing 
phases  by  multiplying  the  CERs  by  the  factors  shown  in  the  table. 

Independent  verification  and  validation  (when  an  independent  contractor  or  agency 
verifies  and  validates  software  being  developed  by  another  contractor  or  agency)  is  also  not 
accounted  for  in  the  CERs.  According  to  [6],  this  additional  effort  could  range  from  20%  to 
40%  of  the  development  effort,  depending  on  the  reliability  of  the  software.  This  factor 
should  be  applied  only  to  flight  CSCIs  and  mission  critical  ground  CSCIs. 


6  Programmers  must  use  space-qualified  computer  hardware.  Because  it  is  time-consuming  for  computers 
to  be  space  qualified,  this  leads  to  the  use  of  equipment  that  is  several  years  behind  the  commercial 
market in terms of performance and design [16].

Table II-3. Effort Phase Distribution for Semi-Detached and Embedded Modes
(Very Large Project >512 EKSLOC)

Mode                             Phase                                             Percentage
Semi-detached (Ground Segment)   Plans and Requirements                                     7
                                 Product Design                                            16
                                 Programming                                               48
                                 System Integration                                        29
                                 Percentage of Effort Accounted by CER                     64
                                 Percentage of Effort Not Accounted by CER                 36
Embedded (Flight)                Plans and Requirements                                     7
                                 Product Design                                            17
                                 Programming                                               44
                                 System Integration and Test                               32
                                 Percentage of CSCI Effort Accounted by CER                61
                                 Percentage of CSCI Effort Not Accounted by CER            39

Source: Reference [6, p. 90].

Note: The numbers in this table were normalized so that they add to 100%.


An  additional  factor  of  40%  should  be  applied  to  the  cost  estimates  generated  by  the 
CERs  to  include  system  level  plans,  requirements,  integration  and  test  activities. 
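The adjustment can be illustrated with a short sketch. It assumes the percentages of Table II-3 are applied to the CER estimate additively, which is how the worked tables in the model application appendix apply them; the function name, the dictionary keys, and the treatment of independent verification and validation as a fraction of the CER estimate are assumptions made for this example.

```python
# Shares of the phases the CERs do not cover, from Table II-3.
UNCOVERED_PHASES = {
    "ground": {"system_integration": 0.29, "plans_requirements": 0.07},
    "flight": {"system_integration": 0.32, "plans_requirements": 0.07},
}


def adjust_cer_estimate(cer_estimate, basing_mode, ivv_fraction=0.0):
    """Add the omitted phases to a CER estimate (cost or effort, any unit).

    ivv_fraction is an optional allowance for independent verification and
    validation; the text cites 20% to 40% of the development effort.
    """
    phases = UNCOVERED_PHASES[basing_mode]
    additions = {name: cer_estimate * share for name, share in phases.items()}
    additions["ivv"] = cer_estimate * ivv_fraction
    return cer_estimate + sum(additions.values()), additions


# Ground segment CER total from Table A-1, in thousands of dollars.
total, parts = adjust_cer_estimate(110_360, "ground")
print(round(total), {name: round(value) for name, value in parts.items()})
```

With the ground segment total of Table A-1 as input, the sketch gives the same integration, plans and requirements, and grand total figures as that table.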

I.  ISSUES 

We  intended  to  test  cost  effects  due  to  the  Ada  programming  language;  however, 
our  database  did  not  include  data  points  for  Ada  programs  with  all  the  development  phases 
included.  We  are  aware  of  research  in  this  area  [18],  but  have  not  encountered  space 
system  CERs  with  Ada  data  points. 

During  our  discussions  with  BMDO  officials,  the  question  often  asked  was  whether 
the  CERs  can  be  used  to  estimate  software  cost  for  BM/C3  systems.  The  data  points  in  our 
analysis  did  not  include  systems  of  this  nature.  Our  CERs  can  be  used  to  estimate 
embedded  flight  software  and  ground  segment  software  at  ground  entry  points  for  satellite 
systems.  BM/C3  systems  might  have  a  higher  degree  of  real-time  processing  than  our 
ground segment data points.

III. SCHEDULE ESTIMATION


A.  INTRODUCTION 

Previous research has addressed the question of how long it takes to develop
software [6 and 13]. The traditional method of estimating software development schedule is
to derive an equation that uses development effort (man-months) as the single independent
variable. Our analysis also used effort, specified as average staffing level (man-months
divided by duration), and added a second variable, software size (EKSLOC), to the equation
[19]. Our approach will help answer the question of how much program duration can be
shortened with added staff while holding project size (EKSLOC) constant.

The  next  section  discusses  the  method  used  to  develop  TERs.  Section  C  describes 
the  database,  and  Section  D  presents  the  results.  Section  E  provides  factors  to  be  used  to 
account  for  activities  not  included  in  the  TERs. 

B.  METHOD 

We followed the same method used to derive the CERs to develop the time-estimating
relationships (TERs). Development time (duration), measured in months from product
design through CSCI integration and test, is the dependent variable. The candidate
schedule-driving variables are:

•  Size measured in thousands of equivalent source lines of code (EKSLOC).

•  Average staff level (AVG_MM), measured as total man-months divided by
duration in months.

•  Embedded  flight  software  indicator  (FLT).  This  1/0  indicator  has  a  value  of  1 
for  embedded  flight  and  a  value  of  0  otherwise. 

•  Application  software  indicator  (APP).  This  1/0  indicator  has  a  value  of  1  for 
application  software  and  a  value  of  0  otherwise. 

•  Operating  system  software  indicator  (SYS).  This  1/0  indicator  has  a  value  of  1 
for  operating  system  software  and  a  value  of  0  otherwise. 

•  Support  software  indicator  (SUP).  This  1/0  indicator  has  a  value  of  1  for 
support  software  and  a  value  of  0  otherwise. 
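The candidate variables above can be derived from a single CSCI record as follows. This is only a sketch: the `CsciRecord` fields and the example values are hypothetical and are not taken from the database.

```python
from dataclasses import dataclass


@dataclass
class CsciRecord:
    eksloc: float            # thousands of equivalent source lines of code
    total_man_months: float  # effort from product design through CSCI test
    duration_months: float   # elapsed time over the same phases
    basing_mode: str         # "flight" or "ground"
    software_type: str       # "application", "operating_system", or "support"


def candidate_variables(rec):
    """Return the candidate schedule-driving variables for one CSCI."""
    return {
        "EKSLOC": rec.eksloc,
        "AVG_MM": rec.total_man_months / rec.duration_months,
        "FLT": 1 if rec.basing_mode == "flight" else 0,
        "APP": 1 if rec.software_type == "application" else 0,
        "SYS": 1 if rec.software_type == "operating_system" else 0,
        "SUP": 1 if rec.software_type == "support" else 0,
    }


# Hypothetical ground application CSCI: 144 man-months spread over 36 months.
example = CsciRecord(30.0, 144.0, 36.0, "ground", "application")
print(candidate_variables(example))
```

Exactly one of the three software type indicators is 1 for any record, while FLT varies independently of them.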


C.  DATABASE 


Although  derived  from  the  same  sources  as  the  data  used  for  the  CER  analysis,  the 
data  sample  used  in  our  TER  analysis  contained  only  98  software  development  programs. 
These  data  points  were  mostly  from  the  Goddard  database  and  the  NASA  data  points  in  the 
SSCAG  database.  The  small  sample  is  due  to  the  limited  amount  of  software  CSCI 
schedule  data  for  military  systems  in  the  SSCAG  database.  We  used  only  the  data  points 
that  included  all  the  development  phases  considered  (product  design  through  CSCI 
integration  and  test). 

Originally,  our  TER  database  had  141  software  development  programs,  which  we 
categorized  according  to  space  mission  type  (manned  space  and  unmanned  space)  and  by 
basing  mode  (ground  segment  and  embedded  flight).  For  the  same  reasons  as  in  our  CER 
analysis  (BMDO’s  interest  in  unmanned  space  missions  and  our  residuals  analysis)  our 
database  was  reduced  from  141  to  98  data  points.  Of  the  98  data  points,  91  were  ground 
segment  and  7  were  embedded  flight  CSCIs. 

We also categorized the data sample by software type: application, support, and
operating system. The ground-based data points had 39 application CSCIs, 51 support
CSCIs, and 1 operating system CSCI. All embedded flight data points were application
software. A breakdown of our database is shown in Table III-1. Table III-2 shows the
averages and ranges of the schedule database.


Table III-1. Sources and Numbers of Schedule Data Points by Software Type

                               SSCAG Data
Software Type          DoD Mission   NASA Mission   Goddard Data   Total
Ground Segment
  Application                    0              4             35      39
  Support                        0             25             26      51
  Operating System               0              0              1       1
  Total                          0             29             75      91
Embedded Flight
  Application                    2              3              2       7
  Support                        0              0              0       0
  Operating System               0              0              0       0
  Total                          2              3              2       7

Table III-2. Schedule Database Averages and Ranges

Variable   Basing Mode           Average   Minimum   Maximum
EKSLOC     Applications System      27.8       1.5     130
EKSLOC     Ground Support           27.6       0.4     192.7
AVG_MM     Applications System       6.4       0.6      16.3
AVG_MM     Ground Support            2.9       0.2      16.5

D.  RESULTS 

Due  to  the  small  sample  size  of  the  embedded  flight  data  points,  we  pooled  the 
ground  segment  and  embedded  flight  data  points  for  our  regression.  We  tested  several 
different  specifications  in  developing  TERs  to  estimate  software  development  duration.  We 
analyzed  the  application  and  operating  system  software  separately  from  the  support 
software. 

1. Application and Operating System Software TER

Software size, staff level, and basing mode proved to be significant explanatory
variables, as shown in Equation III-1.

\[
\text{Duration} = 7.2 \times \underset{(7.9,\ .000)}{\text{EKSLOC}^{0.67}} \times \underset{(5.4,\ .000)}{\text{AVG\_MM}^{-0.48}} \times \underset{(6.9,\ .000)}{2.9^{\text{FLT}}} \tag{III-1}
\]

N = 47   Adjusted R2 = 0.53   SEE = 10.2   Intercept Adjustment = 1.05

The  t-scores  and  probability  levels  are  in  parentheses  below  the  parameter 
estimates.  N  is  the  number  of  observations. 
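Equation III-1 can be evaluated directly, as in the sketch below. The function name and argument names are illustrative, and the intercept adjustment is left out on the assumption that the worked schedule tables in the model application appendix also omit it (their durations match the unadjusted equation).

```python
def duration_app_os(eksloc, avg_staff, flight):
    """Equation III-1: development duration in months for an application or
    operating system CSCI, given size (EKSLOC) and average staff level."""
    flt = 1 if flight else 0
    return 7.2 * eksloc ** 0.67 * avg_staff ** -0.48 * 2.9 ** flt


# Inputs taken from Tables A-3 and A-4.
ground = duration_app_os(30, 4, flight=False)    # Command & Status
flight = duration_app_os(50, 20, flight=True)    # Space Operating System
print(round(ground), round(flight))
```

Rounded to whole months, the two results agree with the durations listed for those CSCIs in Tables A-3 and A-4.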

2.  Support  Software  TER 

The  same  explanatory  variables  were  significant  for  support  software  as  for 
application  and  operating  system  software. 

\[
\text{Duration} = 5.4 \times \underset{(18.4,\ .000)}{\text{EKSLOC}^{0.76}} \times \underset{(12.4,\ .000)}{\text{AVG\_MM}^{-0.68}} \tag{III-2}
\]

N = 51   Adjusted R2 = 0.86   SEE = 9.4   Intercept Adjustment = 1.004
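The support software TER (Equation III-2) can be evaluated the same way. This sketch uses an illustrative function name and, as an assumption, omits the intercept adjustment, which is how the worked schedule tables in the model application appendix appear to have been computed.

```python
def duration_support(eksloc, avg_staff):
    """Equation III-2: development duration in months for a support CSCI."""
    return 5.4 * eksloc ** 0.76 * avg_staff ** -0.68


# Inputs taken from Table A-3.
offline = duration_support(100, 12)     # Off-Line Data Processing
simulator = duration_support(400, 45)   # Simulator
print(round(offline), round(simulator))
```

Rounded to whole months, both results agree with the durations listed for those CSCIs in Table A-3.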

Equations III-1 and III-2 indicate that software development duration decreases at a
decreasing rate as staff size increases (as denoted by the negative exponent values of -0.48
and -0.68 on average staff level). This is due to the inefficiencies of a larger staff size
discussed in [20].
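The effect of the staff exponents can be made concrete with a small calculation. The sketch below holds size constant and scales the average staff level; the function name is illustrative.

```python
def duration_ratio(staff_multiplier, staff_exponent):
    """Ratio of new to old duration when average staff is scaled and size is fixed."""
    return staff_multiplier ** staff_exponent


for label, exponent in (("application/OS", -0.48), ("support", -0.68)):
    doubled = duration_ratio(2.0, exponent)      # staff doubled once
    quadrupled = duration_ratio(4.0, exponent)   # staff doubled twice
    print(label, round(doubled, 3), round(quadrupled, 3))
```

Doubling the staff shortens the duration by roughly 28% for application and operating system software and by roughly 38% for support software. A second doubling removes a smaller share of the original duration than the first, which is the decreasing rate described above.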


Equation III-1 suggests embedded flight software takes 2.9 times longer to develop
(holding size and staff level constant), which is consistent with our CER findings. Again,
this is due primarily to a high degree of real-time processing, ultra-high reliability,
interfaces with other equipment besides computer hardware, computer hardware
obsolescence, and so on [16 and 17]. However, the data in our database show that, on
average, embedded flight software takes only 25% longer to develop because:
(1) embedded flight software is smaller in size (67% less) than ground segment software
and (2) more manpower (25% more) is assigned to develop flight software than ground
segment software. Figure III-1 depicts estimates of software CSCI development duration
predicted by Equation III-1.
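The reconciliation between the factor of 2.9 and the observed 25% can be checked numerically. The sketch below applies the exponents of Equation III-1 to the average size and staffing differences quoted above; the variable names are illustrative.

```python
size_ratio = 1.0 - 0.67   # flight CSCIs are 67% smaller than ground CSCIs
staff_ratio = 1.25        # flight CSCIs carry 25% more staff

# Flight duration relative to ground duration under Equation III-1.
relative_duration = 2.9 * size_ratio ** 0.67 * staff_ratio ** -0.48
print(round(relative_duration, 2))
```

The ratio comes out a little under 1.25, so the smaller size and larger staff offset most of the factor of 2.9.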


Figure III-1. Estimates of Application and Operating System Software
CSCI Development Duration (horizontal axis: Source Lines of Code, Thousands)

Equation III-2 suggests that support software takes less time to develop than
application and system software. As expected, due to the low degree of real-time
processing, the support software development duration decreases at a higher rate than the
application and system software (-0.68 versus -0.48) with the application of additional staff.
Figure III-2 depicts estimates of software CSCI development duration predicted by
Equation III-2.

Figure III-2. Estimates of Support Software CSCI Development Duration
(horizontal axis: Source Lines of Code, Thousands)

In developing our models, we also checked for the presence of multicollinearity
using variance inflation factors (VIFs) (see footnote 1). This is to check the dependency
between the two variables in our models, software size and staff level. The application
software has a VIF of less than 3.5, and the support software has a VIF of less than 2.5, an
indication of no multicollinearity in the models.
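For a model with two regressors, the VIF of each regressor reduces to 1 / (1 - r^2), where r is the correlation between the two. The sketch below computes it under that two-regressor assumption; the data values are hypothetical, and taking logarithms of size and staff level is an assumption about how the multiplicative TERs were fitted.

```python
import math


def vif_two_regressors(x1, x2):
    """VIF shared by both regressors in a two-regressor model: 1 / (1 - r^2)."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    r_squared = s12 ** 2 / (s11 * s22)
    return 1.0 / (1.0 - r_squared)


# Hypothetical sizes (EKSLOC) and average staff levels, not database values.
log_size = [math.log(v) for v in (5.0, 12.0, 30.0, 45.0, 100.0)]
log_staff = [math.log(v) for v in (2.0, 1.5, 6.0, 3.0, 9.0)]
print(round(vif_two_regressors(log_size, log_staff), 2))
```

A value near 1 means the two regressors are nearly uncorrelated; values above the 5 to 10 range are the usual warning sign.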

E.  SCHEDULE NOT ACCOUNTED FOR BY THE TERs

Since our TERs do not include the plans and requirements and system integration
and test phases, the factors in Table III-3 can be used to adjust the TERs.

As  shown  in  the  table,  an  additional  45%  to  50%  factor  should  be  applied  to  the 
duration  estimate  generated  by  the  TER  to  include  system-level  plans,  requirements, 
integration,  and  test  activities. 


1  The VIF for each term in the model measures the combined effect of the dependencies among the
regressors on the variance of that term. One or more large VIFs indicate multicollinearity. Practical
experience indicates that if any of the VIFs exceed 5 or 10, it is an indication that the associated
regression coefficients are poorly estimated because of multicollinearity [21].


Table III-3. Schedule Phase Distribution for Semi-Detached and Embedded Modes
(Very Large Project >512 EKSLOC)

Mode                             Phase                                               Percentage
Semi-detached (Ground Segment)   Plans and Requirements                                      19
                                 Product Design                                              23
                                 Programming                                                 32
                                 System Integration                                          26
                                 Percentage of Effort Accounted by TER                       55
                                 Percentage of Effort Not Accounted by TER                   45
Embedded (Flight)                Plans and Requirements                                      29
                                 Product Design                                              27
                                 Programming                                                 23
                                 System Integration and Test                                 21
                                 Percentage of CSCI Schedule Accounted by TER                50
                                 Percentage of CSCI Schedule Not Accounted by TER            50

Source: Reference [6, p. 90].
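The schedule adjustment can be sketched as follows. The sketch assumes the convention used in the worked schedule tables of the model application appendix: system integration is taken as a share of the TER duration, and plans and requirements as a share of the resulting EMD duration. The function and key names are illustrative.

```python
# (system integration share, plans and requirements share) from Table III-3.
SCHEDULE_FACTORS = {
    "ground": (0.26, 0.19),
    "flight": (0.21, 0.29),
}


def adjust_ter_duration(ter_months, basing_mode):
    """Add the phases the TERs omit to a TER duration estimate (months)."""
    integration_share, plans_share = SCHEDULE_FACTORS[basing_mode]
    integration = ter_months * integration_share
    emd = ter_months + integration      # EMD = TER duration + integration
    plans = emd * plans_share           # taken as a share of EMD duration
    return {"integration": integration, "emd": emd, "plans_requirements": plans}


# Embedded flight TER duration of 68 months, as in Table A-4.
print({name: round(value) for name, value in adjust_ter_duration(68, "flight").items()})
```

For the 68-month flight case the rounded results agree with the integration, EMD, and plans and requirements durations shown in Table A-4.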
IV.  CONCLUSION


We  reached  the  following  conclusions  based  on  our  analysis  of  the  cost  database: 

•  User  type  was  a  cost  driver  for  ground-based  application  and  operating  system 
software,  but  not  for  support  software. 

•  DoD  software  development  costs  are  about  60%  higher  for  ground-based 
software  and  40%  higher  for  embedded  flight  software  than  NASA  software 
development  costs. 

•  Development  costs  for  both  ground-based  and  embedded  flight  application  and 
operating  system  software  increase  at  an  increasing  rate  with  size.  However,  as 
size  increases,  embedded  flight  software  development  costs  increase  at  a  much 
higher  rate  than  ground-based  software. 

•  Development costs for military embedded flight software are on average five
times higher than ground segment software.

•  Development  costs  for  ground-based  support  software  are  about  20%  to  25% 
lower  than  for  application  software.  Costs  also  increase  at  a  decreasing  rate 
with  size. 

•  Software productivity (measured in EKSLOC per man-month) did not improve
over time for the programs studied between 1977 and 1988. However, software
functionality has increased considerably over the same time period.

•  Software language cost differences could not be quantified in the CERs for
Ada versus non-Ada software. Our databases did not contain any Ada-language
data points that included all the development phases.

•  Embedded  flight  software  development  exhibits  more  diseconomies  of  scale 
than  does  ground  segment  software  development. 

Our  findings  concerning  the  schedule  database  were: 

•  Basing  mode  was  a  schedule  driver  for  application  software  development,  but 
not  for  support  software. 

•  Given  the  same  staffing  level  and  size,  embedded  flight  software  takes  almost 
three  times  longer  to  develop  than  ground-based  software  due  mainly  to 
stringent  reliability  requirements,  which  result  in  added  testing. 

•  Although adding more staff decreases software development duration, it does
so at a decreasing rate because inefficiency is a by-product of larger staff size.

Because future weapon systems are required to use the Ada programming language,
historical data from programs using Ada should be examined. Our study did not include
Ada programs but could be updated in the future when Ada data points are available. A
comparison between Ada and non-Ada software sizes is contained in [22].

This  study  addressed  space  mission  software  for  both  flight  and  ground-based 
software.  Another  type  of  software  that  is  critical  to  the  BMDO’s  mission  is  command, 
control,  and  communications  software.  Such  software  is  ground-based  and  involves  a  high 
degree  of  real-time  processing.  Our  study  did  not  include  software  of  this  nature.  Insights 
into historical data for programs with similar characteristics would be useful.

MODEL APPLICATION


Analysts can apply the estimating relationships presented in this paper in two ways.
The first approach is to use the models in their role as an assessment tool. The second
approach is, given a desired model output, to solve for one of the independent variables in
the model. In the examples that follow, we present the application of the models using the
first approach for the CERs and both approaches for the TERs. Before applying the
estimating relationships for the software development schedule, we need to make some
assumptions about the hypothetical program and the spacecraft associated with it. Our
hypothetical spacecraft, the SSD-1, is a medium-sized unmanned surveillance spacecraft for
DoD. It requires 13 CSCIs to carry out the mission: 4 embedded flight CSCIs for on-board
processing and 9 ground-based CSCIs for the system's ground-control segment. In
addition, 4 ground-based support CSCIs are required. The aggregate software sizes
are 125 KSLOC for embedded flight and 1,119 KSLOC for ground-based, of which 695
KSLOC are support software.

SOFTWARE  DEVELOPMENT  COST 

Presented in Tables A-1 and A-2 are the estimates of the software costs for the
SSD-1 program for both ground segment and embedded flight. The total estimates include
costs not accounted for by the models, system integration and test (29% for the ground
segment and 32% for the embedded flight), and plans and requirements (7% for both
ground segment and embedded flight). Using Equations II-2 and II-5, respectively, to
calculate development effort in man-months, we can get the estimates for the development
cost by assuming a cost of $16,000 per man-month.

Table A-1. SSD-1 Ground Software Cost Estimates

Basing                               Software   New SLOC   Effort   Cost (K) at   Cost per
Mode     CSCI Name                   Type       (K)        (MM)     $16K/MM       SLOC
Ground   Off-Line Data Processing    S               100      429         6,858         69
Ground   Command & Status            A                30      266         4,254        142
Ground   Communications              A                15      126         2,012        134
Ground   Data Processing             A                45      412         6,591        146
Ground   Mission Performance         A                70      664        10,622        152
Ground   Telemetry Processing        A                20      172         2,745        137
Ground   Operator Interface          A               100      976        15,613        156
Ground   Environment Interface       A                90      871        13,934        155
Ground   Test Support                S                80      344         5,511         69
Ground   Simulator                   S               400    1,668        26,683         67
Ground   Configuration Control       A                40      363         5,804        145
Ground   Mission Planning            S               115      492         7,865         68
Ground   Signal Processing           A                14      117         1,868        133

Ground Segment Total                               1,119    6,898       110,360         99
System Level Integration (29%)                                            32,004
EMD Total                                                                142,365        127
DEM/VAL Phase Cost: Plans & Requirements (7%)                              7,725
Ground Segment Grand Total                                               150,090        134
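The cost columns of Table A-1 follow from the effort column by simple arithmetic, sketched below. The function name is illustrative; the $16,000 per man-month rate is the assumption stated in the text.

```python
COST_PER_MAN_MONTH_K = 16.0   # $16,000 per man-month, expressed in $K


def cost_from_effort(effort_mm, new_ksloc):
    """Return (cost in $K, cost in dollars per source line of code)."""
    cost_k = effort_mm * COST_PER_MAN_MONTH_K
    # $K divided by KSLOC is dollars per line, because the thousands cancel.
    return cost_k, cost_k / new_ksloc


# Operator Interface row of Table A-1: 976 man-months, 100 KSLOC.
cost_k, per_sloc = cost_from_effort(976, 100)
print(cost_k, round(per_sloc))
```

The computed cost differs from the tabulated 15,613 by a few thousand dollars, presumably because the table multiplies the unrounded effort; the cost per line rounds to the tabulated value.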

Table A-2. Embedded Flight Software Cost Estimates

Basing                        Software   New SLOC   Effort   Cost (K) at   Cost per
Mode    CSCI Name             Type       (K)        (MM)     $16K/MM       SLOC
Space   Operating System      A                50    3,601        57,619      1,152
Space   Communications        A                35    2,132        34,108        975
Space   Diagnostics           A                25    1,300        20,799        832
Space   Signal Processing     A                15      613         9,816        654

Space Flight Total                            125    7,646       122,342        979
System Level Integration (32%)                                    39,149
EMD Total                                                        161,491      1,292
DEM/VAL Phase Cost: Plans & Requirements (7%)                      8,564
Space Flight Grand Total                                         170,055      1,360



SOFTWARE  DEVELOPMENT  SCHEDULES 


For the time-estimating relationships, we present examples of two applications:
schedule assessment and staff level estimation. Tables A-3 and A-4 illustrate the application
of our TER, shown in Equation III-1, in its role as a schedule assessment tool for ground
segment and embedded flight software. The estimate will include the two phases not
accounted for by our TER: plans and requirements [19% for ground segment and 29% for
embedded flight of the engineering and manufacturing development (EMD) schedule] and
system integration and test (26% for ground segment and 21% for embedded flight). Note
that EMD duration includes development time accounted for by our TER and system
integration and testing duration.


Table A-3. Ground-Based Schedule Estimates

Basing                               Software   New SLOC   Staff   Duration
Mode     CSCI Name                   Type       (K)        Level   (months)
Ground   Off-Line Data Processing    S               100      12         33
Ground   Command & Status            A                30       4         36
Ground   Communications              A                15       2         32
Ground   Data Processing             A                45       6         39
Ground   Mission Performance         A                70       9         43
Ground   Telemetry Processing        A                20       3         32
Ground   Operator Interface          A               100      15         43
Ground   Environment Interface       A                90      13         43
Ground   Test Support                S                80       6         45
Ground   Simulator                   S               400      45         39
Ground   Configuration Control       A                40       5         39
Ground   Mission Planning            S               115      10         42
Ground   Signal Processing           A                14       3         25

Ground Segment Total                               1,119     133         45
System Level Integration (26%)                                            12
EMD Duration                                                              56
DEM/VAL Phase Cost: Plans & Requirements (19%)                            11

Table A-4. Embedded Flight Schedule Estimates

Basing                        Software   New SLOC   Staff   Duration
Mode    CSCI Name             Type       (K)        Level   (months)
Space   Operating System      A                50      20         68
Space   Communications        A                35      15         62
Space   Diagnostics           A                25      10         60
Space   Signal Processing     A                15       6         54

Space Flight Total                            125      51         68
System Level Integration (21%)                                     14
EMD Duration                                                       82
DEM/VAL Duration: Plans & Requirements (29%)                       24

In estimating the staff level required given a fixed development duration, we
assume one CSCI for each software type: ground application, ground support, and flight
application. In order to use our model to estimate the average staff level, we need to specify
when the software is required for the program to proceed on schedule. For the SSD-1
program, we require that the embedded flight software be completed in a duration of 68
months. For the ground segment, we require a development duration of 45 months. The
question is: What average staff level do we need to support the software development
schedule?

From our TERs for ground-based support software,

\[ \text{Duration} = 5.4\,(\text{KESLOC})^{0.76}\,(\text{Average Staff Level})^{-0.68}, \]

and application and operating system software,

\[ \text{Duration} = 7.2\,(\text{KESLOC})^{0.67}\,(\text{Average Staff Level})^{-0.48}\,(2.9)^{\text{FLT}}, \]

we solved for average staff level, yielding:

•  Ground-based support software:

\[ \text{Average Staff Level} = \left[5.4\,(\text{KESLOC})^{0.76}/\text{duration}\right]^{1/0.68} = \left[5.4\,(695)^{0.76}/45\right]^{1.47} = 66.3 \]

•  Ground-based application and operating system software:

\[ \text{Average Staff Level} = \left[7.2\,(\text{KESLOC})^{0.67}/\text{duration}\right]^{1/0.48} = \left[7.2\,(424)^{0.67}/45\right]^{2.08} = 102.2 \]

•  Embedded flight application and operating system software:

\[ \text{Average Staff Level} = \left[7.2\,(\text{KESLOC})^{0.67}\,(2.9)^{\text{FLT}}/\text{duration}\right]^{1/0.48} = \left[7.2\,(125)^{0.67}\,(2.9)^{1}/68\right]^{2.08} = 71.7 \]

For  the  SSD-1,  where  the  three  developments  overlap,  a  combined  average  staff  of 
240.2  software  engineers  is  needed. 
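The inversion of the TERs for staff level can be sketched as below. The function and variable names are illustrative, and the sketch uses the exact reciprocal exponents (1/0.68 and 1/0.48) rather than the rounded 1.47 and 2.08, so the results differ from the hand calculation by a few tenths.

```python
def required_staff(ksloc, duration_months, is_support, flight=False):
    """Average staff level that meets a target duration, by inverting the TERs."""
    if is_support:
        coeff, size_exp, staff_exp = 5.4, 0.76, 0.68   # Equation III-2
    else:
        coeff, size_exp, staff_exp = 7.2, 0.67, 0.48   # Equation III-1
        if flight:
            coeff *= 2.9                               # FLT = 1
    return (coeff * ksloc ** size_exp / duration_months) ** (1.0 / staff_exp)


staff_support = required_staff(695, 45, is_support=True)
staff_ground_app = required_staff(424, 45, is_support=False)
staff_flight_app = required_staff(125, 68, is_support=False, flight=True)
print(round(staff_support, 1), round(staff_ground_app, 1), round(staff_flight_app, 1))
print(round(staff_support + staff_ground_app + staff_flight_app, 1))
```

Because duration enters with exponent 1/0.48 or 1/0.68, a modest tightening of the required duration raises the required staff level more than proportionally.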

There are other ways to apply the data and analyses presented in this paper to
schedule assessment problems. The purpose of the SSD-1 example is to provide examples
so that BMDO analysts can make better use of the analyses provided.

ANALYSIS OF OUTLIERS

Frequently in regression analysis applications, the data set contains some
observations that are outlying or extreme. These outliers may involve large residuals and
often have dramatic effects on the fitted least squares regression function. However, the
fact that an observation is an outlier (that is, an observation that provides a large residual
when the chosen model is fitted to the data) does not necessarily mean that the observation
is an influential one with respect to the fitted equation. It is important to study the outlying
observations carefully and decide whether they should be retained or eliminated. We used
three statistics to help identify influential data points that are outlying with respect to their X
or Y values: Hat Diagonal Matrix, RSTUDENT, and Cook’s distance.

HAT  DIAGONAL  MATRIX 

The Hat Diagonal Matrix is used to identify outlying X observations. The diagonal
element of the hat matrix ($H = X(X'X)^{-1}X'$), defined as $h_i = X_i'(X'X)^{-1}X_i$ (where
$X_i$ pertains to the ith observation and $X_i'$ is the ith row of the X matrix pertaining to
the ith observation), is called the leverage (in terms of the X values) of the ith observation.
It indicates whether or not the X values for the ith observation are outlying. Each $h_i$
reflects the influence of an observed data point on the fitted value $\hat{Y}_i$. A large value
$h_i$ indicates that the ith observation is distant from the center of the X observations. If the
ith data point is an outlying X data point (one with a large leverage value $h_i$), it exercises
substantial leverage in determining the fitted value $\hat{Y}_i$. A leverage value $h_i$ is
usually considered to be large if it is more than twice as large as the mean leverage value
$\bar{h} = p/n$, where p is the number of regression parameters in the regression function,
including the intercept term. Leverage values greater than $2p/n$ indicate outlying
observations that may have undue influence on the fit of the regression model [23].
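The leverage screen can be illustrated for the simplest case. The sketch below assumes a model with an intercept and a single regressor (p = 2), for which the diagonal of the hat matrix has the closed form 1/n + (x_i - mean)^2 / Sxx; the data values are hypothetical.

```python
def leverage_values(x):
    """Leverage h_i for a one-regressor model with intercept (p = 2)."""
    n = len(x)
    mean = sum(x) / n
    sxx = sum((v - mean) ** 2 for v in x)
    return [1.0 / n + (v - mean) ** 2 / sxx for v in x]


x = [1.0, 2.0, 3.0, 4.0, 10.0]   # hypothetical regressor values
h = leverage_values(x)
p, n = 2, len(x)

# Flag observations whose leverage exceeds twice the mean leverage p/n.
flagged = [i for i, hi in enumerate(h) if hi > 2 * p / n]
print([round(hi, 2) for hi in h], flagged)
```

The leverage values sum to p, so their mean is p/n, which is why 2p/n serves as the screening threshold.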

RSTUDENT 

The studentized residual, RSTUDENT, is used to detect outlying or extreme Y
observations based on an examination of the residuals. When the residuals $e_i$ have
substantially different variances $\sigma^2(e_i)$, the magnitude of $e_i$ relative to
$\sigma(e_i)$ should be considered instead of the regression standard error estimate
$\sqrt{\text{MSE}}$ to give recognition of differences in their sampling errors. The residual
mean square MSE is the residual sum of squares SSE divided by its associated degrees of
freedom $n - p$ ($n - 2$ for a model with a single regressor). The variance, denoted by
$\sigma^2(e_i) = \sigma^2(1 - h_i)$, has an unbiased estimator $s^2(e_i) = \text{MSE}(1 - h_i)$,
where $e_i$ is the residual $Y_i - \hat{Y}_i$ and $h_i$ is the leverage value. The ratio of
$e_i$ to $s(e_i)$ is called the “studentized residual.” It is denoted by:

\[ e_i^* = \frac{e_i}{s(e_i)} \]

When the ith observation is deleted, the regression function is fitted to the remaining n - 1
observations, and the point estimate of the expected value when the X levels are those of
the ith observation, denoted by $\hat{Y}_{i(i)}$, will be compared with the actual $Y_i$
observed value. The residual $d_i = Y_i - \hat{Y}_{i(i)}$ is called a “deleted residual.” Thus,
the studentized deleted residual denoted by $d_i^*$ is:

\[ d_i^* = \frac{d_i}{s(d_i)} \]

where $s(d_i)$ is the estimated standard deviation of $d_i$. However, the studentized deleted
residuals $d_i^*$ can be calculated without having to fit the regression function with the ith
observation omitted. An algebraically equivalent expression for $d_i^*$ is:

\[ d_i^* = e_i \left[ \frac{n - p - 1}{\text{SSE}\,(1 - h_i) - e_i^2} \right]^{1/2} \]

Note that the studentized deleted residual $d_i^*$ can be calculated from the residual $e_i$,
the sum of squares SSE, and the leverage value $h_i$, all for the fitted regression based on the
n observations.

To identify outlying Y observations, we examine the studentized deleted residuals
for large absolute values and use the appropriate t distribution to ascertain how far in the
tails such outlying values fall. The typical criterion for screening is to use 2.0 for the
RSTUDENT value. Data points with an absolute RSTUDENT value greater than 2.0 would
be considered influential outliers [24 and 25].
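The screening step can be sketched for a model with an intercept and one regressor (p = 2). The sketch uses the algebraically equivalent expression for the studentized deleted residual, so no refitting is needed; the data are hypothetical, with one deliberately high Y value.

```python
import math


def rstudent(x, y):
    """Studentized deleted residuals for a one-regressor model (p = 2)."""
    n, p = len(x), 2
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]    # ordinary residuals
    h = [1.0 / n + (xi - mx) ** 2 / sxx for xi in x]     # leverage values
    sse = sum(ei ** 2 for ei in e)
    return [ei * math.sqrt((n - p - 1) / (sse * (1.0 - hi) - ei ** 2))
            for ei, hi in zip(e, h)]


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.1, 3.8, 9.0, 6.1]   # hypothetical; the fifth Y value is high
d_star = rstudent(x, y)
flagged = [i for i, d in enumerate(d_star) if abs(d) > 2.0]
print(flagged)
```

Only the fifth observation exceeds the 2.0 threshold in absolute value. Whether such a point is dropped remains a judgment call for the analyst.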
COOK’S DISTANCE MEASURE


Cook’s distance measures the change in the estimated regression coefficient vector
caused by deletion of an observation; that is, the difference between the vector $b$ of the
estimated regression coefficients based on all n observations and the vector $b_{(i)}$ based
on the n - 1 observations with the ith observation deleted [24]. The Cook’s distance
measure $D_i$ uses the boundary of the confidence region for all p regression coefficients
$\beta_k$ ($k = 0, \ldots, p - 1$), given by

\[ \frac{(b - \beta)'\,X'X\,(b - \beta)}{p\,\text{MSE}} = F(1 - \alpha;\ p,\ n - p), \]

for measuring the combined impact of the differences in the estimated regression
coefficients when the ith observation is deleted:

\[ D_i = \frac{(b - b_{(i)})'\,X'X\,(b - b_{(i)})}{p\,\text{MSE}} \]

$D_i$ can be evaluated by comparing it with an appropriate F distribution. Although $D_i$
does not follow an F distribution exactly, it has been found useful to relate its value to the
tail area probability of the corresponding F distribution. To assess the magnitude of $D_i$,
one should refer to the corresponding F(p, n - p) distribution and ascertain the tail area
probability. If $D_i$ lies beyond the 90th percentile of that F distribution, the distance
between the vectors $b$ and $b_{(i)}$ should be considered large, meaning the ith
observation has a substantial influence on the fit of the regression function.

Cook’s distance measure $D_i$ can be calculated without fitting a new regression
function where the ith observation is deleted. An algebraically equivalent expression
is [23]:

\[ D_i = \frac{e_i^2}{p\,\text{MSE}} \times \frac{h_i}{(1 - h_i)^2} \]

Note that $D_i$ depends on two factors: (1) the size of the residual $e_i$ and (2) the
leverage value $h_i$. The larger $e_i$ or $h_i$ is, the larger $D_i$ is. Thus, the ith
observation can be influential: (1) by having a large residual $e_i$ and only a moderate
leverage $h_i$, or (2) by having a large leverage value $h_i$ with only a moderately sized
residual $e_i$, or (3) by having both a large residual $e_i$ and a large leverage value $h_i$.
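The closed-form expression for Cook's distance can be sketched for a model with an intercept and one regressor (p = 2). The data are hypothetical: the last point sits far from the other X values and below the trend of the rest, so it combines high leverage with a sizable residual.

```python
def cooks_distance(x, y):
    """Cook's D_i for a one-regressor model with intercept (p = 2)."""
    n, p = len(x), 2
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]    # residuals
    h = [1.0 / n + (xi - mx) ** 2 / sxx for xi in x]     # leverage values
    mse = sum(ei ** 2 for ei in e) / (n - p)
    return [ei ** 2 / (p * mse) * hi / (1.0 - hi) ** 2 for ei, hi in zip(e, h)]


x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
y = [1.1, 2.0, 2.9, 4.2, 5.0, 5.0]   # hypothetical; last point has high leverage
d = cooks_distance(x, y)
most_influential = max(range(len(d)), key=d.__getitem__)
print(most_influential, [round(v, 2) for v in d])
```

The last observation dominates the measure, which matches case (3) above: its leverage and its residual are both large.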

While  analysis  of  outlying  and  influential  observations  is  a  necessary  component  of 
good  regression  analysis,  it  is  neither  automatic  nor  foolproof  and  requires  good  judgment 
by the analyst [23].

ABBREVIATIONS

BM/C3      Battle Management and Command, Control, and Communications
BMDO       Ballistic Missile Defense Organization
CARD       Cost Analysis Requirements Document
CER        cost-estimating relationship
CSCI       computer software configuration item
DoD        Department of Defense
ESLOC      equivalent source lines of code
EMD        engineering and manufacturing development
IDA        Institute for Defense Analyses
JPL        Jet Propulsion Laboratory
NASA       National Aeronautics and Space Administration
POET       Phase One Engineering Team
RSTUDENT   Studentized Residual
SDIO       Strategic Defense Initiative Organization
SLOC       source lines of code
SMC        Space and Missile Systems Center
SSCAG      Space Systems Cost Analysis Group
TER        time-estimating relationship
VIF        variance inflation factor



UNCLASSIFIED 


REPORT  DOCUMENTATION  PAGE 

Form  Approved 

OMB No. 0704-0188

1. AGENCY USE ONLY (Leave blank)

2. REPORT DATE

May 1994

3. REPORT TYPE AND DATES COVERED

Final Report, Oct 1992 - Apr 1994

4. TITLE AND SUBTITLE

Estimating  Software  Development  Costs  and  Schedules  for  Space 
Systems 

5.  FUNDING  NUMBERS 

MDA903  89C  0003 
T-R2-597.12 

6. AUTHOR(S)

James  Bui,  Neang  I.  Om 

7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES)

Institute for Defense Analyses

1801 N. Beauregard Street

Alexandria, VA 22311-1772

8. PERFORMING ORGANIZATION REPORT NUMBER

IDA Paper P-2830

9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES)

BMDO/AQP

Room 1E1044, The Pentagon

Washington, DC 20301

10. SPONSORING/MONITORING AGENCY REPORT NUMBER

11.  SUPPLEMENTARY  NOTES 

12A. DISTRIBUTION/AVAILABILITY STATEMENT

Approved  for  public  release;  distribution  unlimited. 

12B.  DISTRIBUTION  CODE 

13. ABSTRACT (Maximum 200 words)

This  paper  analyzes  historical  software  development  costs  and  schedules  from  military  and  NASA 
satellite  systems.  Cost-  and  time-estimating  relationships  were  developed  for  ground  segment  and 
embedded  flight  software.  In  addition  to  the  traditional  size  variable,  other  cost  drivers  were  found: 
software  residence  (ground  or  flight),  and  software  type  (application  and  support).  Schedule  driving 
variables  included  size,  staffing  level,  and  residence.  The  equations  developed  from  the  study  can  be 
used  to  estimate  future  space  system  software  development  costs  or  to  cross-check  estimates  generated 
by  other  methods. 

14.  SUBJECT  TERMS 

Cost  Estimating  Relationships,  Time  Estimating  Relationships,  Space  Systems, 
Computer  Programs 

16. PRICE CODE

17. SECURITY CLASSIFICATION OF REPORT

Unclassified

18. SECURITY CLASSIFICATION OF THIS PAGE

Unclassified

19. SECURITY CLASSIFICATION OF ABSTRACT

Unclassified

20. LIMITATION OF ABSTRACT

SAR

NSN 7540-01-280-5500
Standard Form 298 (Rev. 2-89)
Prescribed by ANSI Std. Z39-18
298-102




